Towards a Flexible Model of Word Meaning

Author

  • Magnus Sahlgren

Abstract

We would like to build a model of semantic knowledge that has the capacity to acquire and represent semantic information that is ambiguous, vague and incomplete. Furthermore, the model should be able to acquire this knowledge in an unsupervised fashion from unstructured text data. Such a model needs to be both highly adaptive and very robust. In this submission, we will first try to identify some fundamental principles that a flexible model of word meaning must adhere to, and then present a possible implementation of these principles in a technique we call Random Indexing. We will also discuss current limitations of the technique and set the direction for future research.

Copyright © 2002, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction: the nature of meaning

Meaning is a key concept in Natural Language Processing (NLP). Not only is it perhaps the most important theoretical object of study in linguistic analysis, but it also defines the higher-level linguistic competences that characterize any proficient language user. Our models, theories and descriptions of natural language and of linguistic behavior are necessarily incomplete without an account of meaning. By the same token, any NLP application that aims to reach beyond the level of plain syntax needs to be able to deal with meaning computationally. However, few linguistic phenomena have proven to be as difficult to isolate and describe as meaning. To a certain extent, this is due to the nature of the phenomenon itself – it simply does not lend itself easily to explicit theoretical analysis. On the contrary, it has been, and still is, one of the most controversial subjects, and elusive concepts, in the philosophy of language and in linguistics. Furthermore, and as a consequence of this theoretical opaqueness, meaning does not seem to lend itself in any obvious way to computational treatment. Consequently, it is not clear what computational and representational strategies we need to employ in order to capture the phenomenon of meaning.

Part of the reason for this apparent theoretical impasse is, as has already been hinted at, the inherent elusiveness of the phenomenon. Meaning is not something that is fixed and static; on the contrary, it is a dynamic phenomenon. Meaning changes over time – words lose their meanings, change their meanings and acquire new meanings as time goes by. For example, “computer” or “web site” do not mean the same thing today as they did a hundred years ago. Meaning also changes between speakers and between contexts of use – it is highly speaker- and situation-dependent. For example, when a zoologist uses the word “mouse,” it is likely to mean something different than when a computer scientist uses the same word. Of course, these are obvious examples. One could, if one were of a skeptical nature, argue that, in reality, meaning is in a constant state of flux, and that people rarely or never completely agree on what the meaning of any given word is. Furthermore, meanings are relative in the sense that it is only within a linguistic context that words can have meanings, and it is only in relation to other words that they actually mean anything. Words can (and usually do) mean different things within different linguistic contexts, and they can (and usually do) mean different things in relation to different words – this, of course, is the well-known phenomenon of polysemy.
One example is “bark,” which means (among other things; “bark” has 9 senses in WordNet, http://www.cogsci.princeton.edu/~wn) one thing in conjunction with trees, and another in conjunction with dogs. Our model must be able to handle these features; it must be flexible and dynamic, and it must be able to represent information relativistically.

Model – but model what?

Natural language is characterized by ambiguity, vagueness and incompleteness. Almost every utterance found in ordinary discourse is theoretically ambiguous. Of course, in practice, these theoretical ambiguities never give rise to real misunderstandings or communicative difficulties – even proper ambiguities like “bank” or “fly” seldom give rise to communicative difficulties in ordinary discourse, and even if proper ambiguities do arise, there is usually enough information in the linguistic (or extra-linguistic) context to help us disambiguate the word, phrase or sentence in question. This is usually also the case with the vagueness that characterizes ordinary language. Furthermore, incompleteness permeates ordinary linguistic discourse, for example through phonological and syntactic reduction.

Now, ambiguity, vagueness and incompleteness are notoriously difficult to handle computationally. This, together with the fact that ordinary language use is very noisy, has led some theorists to argue that the incomplete, noisy and imprecise form of natural language obscures rather than elucidates its semantic content, and that we therefore need a more exact form of representation that obliterates the ambiguity and incompleteness of natural language. Historically, logic has often been cast in this role, with the idea that logic provides a more stringent and precise formalism that makes explicit the semantic information hidden in the imprecise form of natural language. Advocates of this view claim that we should not model natural language use, since it is noisy and imprecise, but rather language in the abstract. We believe that this path is sadly mistaken. Ambiguity, vagueness and incompleteness are essential properties of natural language – they are signs of communicative prosperity and of linguistic richness, not signs of communicative malfunction and linguistic deterioration. It is a mistake to think that any semantic model, theory or description could transcend these features, since they are essential to language. In the words of Ludwig Wittgenstein: “It is clear that every sentence in our language ‘is in order as it is’” (Wittgenstein, 1953). This means that if our model is to be able to handle natural language semantics, it must be able to describe actual linguistic behavior – ordinary and concrete language use – and not language in the abstract. It must be able to handle ambiguity and incompleteness; it must be robust and adaptive.

Vector-based representational schemes

Representational issues are paramount to these considerations. We need a representational scheme that is flexible, relativistic, robust and adaptive. These are stern demands that call for radically new ideas. We believe that a certain breed of vector-space models might present an alternative in this direction. Vector-space models were first introduced into NLP as an alternative to word-based methods in information retrieval (Salton and McGill, 1983). The problem with word-based methods is that they do not cope very well with synonymy and polysemy.
Vector-space models overcome the problem of the variability of word usage by matching documents based on content rather than on mere words. This is done by representing the semantic content of words and documents as vectors in a multi-dimensional space, and then matching documents and queries based on the relative locations of their respective vectors in the word/document space. This enables vector-space models to retrieve relevant documents that need not contain any of the words in the query. For example, a vector-space model may retrieve documents containing the word “ship” even if only the word “boat” was used in the query.

The vectors are constructed by observing collocational statistics in the text data. The co-occurrence information is collected in a frequency matrix, where each row hosts a unique word and each column stands for a given linguistic context, such as a document or a word. The cells of the matrix indicate the frequency of occurrence in, or co-occurrence with, documents or words. Latent Semantic Analysis, LSA (Landauer and Dumais, 1997), is an example of a vector-space model that uses document-based co-occurrence statistics and thus collects the frequency information in a words-by-documents co-occurrence matrix, while Hyperspace Analogue to Language, HAL (Lund et al., 1995), is an example of a vector-space model that employs word-based collocational information and thus represents the data in a words-by-words co-occurrence matrix.

Either way the data is collected, the matrix containing raw co-occurrence frequencies will not be very informative, and it will become computationally ungainly with large vocabularies and large document collections. To reduce the problem space, and to excavate the semantic information that is latent in the frequency matrix, vector-space models generally use some form of dimension reduction to reduce the dimensionality (and, implicitly, thereby also the noise) of the co-occurrence matrix. There are a few different approaches to performing this dimension reduction step: LSA first normalizes the co-occurrence frequencies and then uses a mathematical technique called singular value decomposition to reduce the dimensionality of the matrix, while HAL uses a “column variance method,” which consists in discarding the columns with the lowest variance.

The vector calculations are completed when the dimension reduction step is done. The dimensionality of the co-occurrence matrix has been reduced to a fraction of its original size, and words are thus represented in the final matrix by semantic vectors of dimensionality n. LSA is reported to be optimal at n = 300 (Landauer and Dumais, 1997), and HAL at n = 200 (Lund et al., 1995). The vector space can now be used to calculate semantic similarity between words, which is done by calculating the similarity between the vectors. Commonly used measures of vector similarity are, for example, the cosine measure (of the angle between the vectors), Euclidean distance and the City Block metric.
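As a purely illustrative sketch (not the paper's own implementation), the following Python snippet builds a tiny words-by-documents frequency matrix, reduces it with a truncated singular value decomposition in the style of LSA, and compares the reduced word vectors with the similarity measures named above. The toy corpus and the choice of k = 2 dimensions are assumptions made only for demonstration.

```python
# Minimal LSA-style sketch: words-by-documents counts -> truncated SVD ->
# similarity between reduced word vectors. Corpus and k are toy assumptions.
import numpy as np

documents = [
    "the dog barked at the cat",
    "the dog chased the cat",
    "the boat sailed on the sea",
    "the ship sailed across the sea",
]

# Words-by-documents frequency matrix: each row hosts a unique word,
# each column stands for a document.
vocabulary = sorted({w for doc in documents for w in doc.split()})
row = {w: i for i, w in enumerate(vocabulary)}
counts = np.zeros((len(vocabulary), len(documents)))
for j, doc in enumerate(documents):
    for w in doc.split():
        counts[row[w], j] += 1

# Dimension reduction: keep only the k largest singular values and vectors,
# so every word is represented by a dense k-dimensional semantic vector.
k = 2
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
word_vectors = U[:, :k] * S[:k]

# Commonly used similarity/distance measures between semantic vectors.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def city_block(a, b):
    return float(np.abs(a - b).sum())

boat = word_vectors[row["boat"]]
ship = word_vectors[row["ship"]]
dog = word_vectors[row["dog"]]
print(cosine(boat, ship), cosine(boat, dog))   # "boat"/"ship" come out as more similar
print(euclidean(boat, ship), euclidean(boat, dog))
print(city_block(boat, ship), city_block(boat, dog))
```

Note that “boat” and “ship” never co-occur in the same toy document; they become similar only because their documents contain similar words, which is the behavior the retrieval example above relies on.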
The merit of vector-spaces

Vector-space models have some desirable features. To begin with, we may note that the semantic content of the vector space does not reside in the vectors as such, but rather in the relations between the vectors; the vectors do not mean anything in isolation from all the other vectors in the space. The semantic space can thus be thought of as demarcating the specific linguistic context, or, to borrow a term from Wittgenstein, the particular language-game (Wittgenstein, 1953), described by the training data. This form of representation has the appealing property of being relativistic by nature, since the only vital property of the space is the relative direction of the vectors.

Another appealing property of vector-space models is that the semantic information is extracted automatically, in an unsupervised fashion, from unstructured text data. This requires no preprocessing of the data, and it involves no human interaction. We use the label “vector-based semantic analysis” to denote the practice of using mere statistical regularities in the text data – usually co-occurrence information – to automatically construct the vectors and the vector space. No prior knowledge of the text data is assumed, making the models easy to apply to text data with different topical and structural properties. Consequently, vector-space models are inherently adaptive when applied to new domains, since the dynamics of the semantic space will reflect the semantics of the training data. This means that different domains will produce different semantic spaces, with different semantic relations between different words. For example, if we train the model on a zoological database, “mouse” will most certainly be correlated with other words referring to, for example, small, furry animals or rodents, while if we train the model on documents with computer-related subjects, “mouse” will presumably be correlated with other words referring to, for example, computer hardware. Although this remains a matter for empirical validation, this feature presumably also makes the models easily applicable to different languages.

Finally, and most importantly, vector-space models do not require, and do not provide, any answer to the question of what meaning is. The models merely reflect how words are actually used in a sample of natural language behavior. The only philosophical assumption we need to make is the hypothesis that two words are semantically related if they are used in a similar manner, but this assumption does not say anything about the ontological status of meanings – it merely assumes that use is a symptom of meaning.

The draw-backs of vector-spaces

Traditional vector-space methodologies have some, more or less serious, shortcomings, and they all, in some way, have to do with the dimension reduction step. Unfortunately, this step is necessary not only to reduce the noise and thereby uncover the latent semantic structures in the original frequency counts, but also because “localist” matrices of this type (i.e. matrices where each column represents a unique linguistic entity) will become computationally intractable for large text data with large vocabularies. The “efficiency threshold,” if we may call it that, is lower for techniques using words-by-words matrices than for techniques using document-based co-occurrence statistics (on the other hand, the collocational statistics captured in a words-by-words matrix will be both quantitatively and qualitatively more informative, since the number of co-occurrence events will be higher than when using documents to define the co-occurrence region), but the problem of scalability is common to all techniques using “localist” matrices. Furthermore, dimension reduction is a one-time operation with a rigid result. This means that, once a dimension reduction has been performed, it is impossible to add new data to the model. If new data is encountered, the entire space has to be rebuilt from scratch. This of course seriously impedes the flexibility of the model. Also, dimension reduction techniques such as singular value decomposition tend to be computationally very costly, with regard to both memory and execution time. Even if the dimension reduction step is a one-time cost, it is still a considerable cost, which affects the efficiency of the model.
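The abstract names Random Indexing as the proposed remedy, but the excerpt breaks off before describing it. As a hedged, purely illustrative sketch of the general idea of an incremental, fixed-dimensional alternative to a one-time dimension reduction (not necessarily the paper's exact formulation), the snippet below assigns each new document a sparse random index vector and accumulates word vectors by summation, so new text can be folded in at any time without rebuilding the space. The dimensionality, sparsity and toy corpus are assumptions made only for demonstration.

```python
# Illustrative sketch of incremental, fixed-dimensional context vectors in the
# spirit of Random Indexing. DIM, NON_ZERO and the corpus are toy assumptions.
import numpy as np

DIM = 1000        # dimensionality of the space, fixed in advance
NON_ZERO = 10     # number of randomly placed +1/-1 elements per index vector
rng = np.random.default_rng(0)

def random_index_vector():
    """A sparse ternary vector: mostly zeros, a few randomly placed +1s and -1s."""
    v = np.zeros(DIM)
    positions = rng.choice(DIM, size=NON_ZERO, replace=False)
    v[positions] = rng.choice([1.0, -1.0], size=NON_ZERO)
    return v

context_vectors = {}    # word -> accumulated semantic vector

def add_document(doc):
    """Fold a new document into the space; no rebuilding, no dimension reduction."""
    index_vector = random_index_vector()
    for word in doc.split():
        context_vectors.setdefault(word, np.zeros(DIM))
        context_vectors[word] += index_vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Documents can arrive one at a time, at any point, without invalidating
# the vectors that have already been accumulated.
for doc in ["the boat sailed on the sea",
            "the ship sailed across the sea",
            "the dog chased the cat"]:
    add_document(doc)

print(cosine(context_vectors["boat"], context_vectors["sea"]))   # share a document: clearly above zero
print(cosine(context_vectors["boat"], context_vectors["dog"]))   # never co-occur: near zero
```

Because every document is processed independently and the dimensionality never grows, such a scheme can keep absorbing new data, which is exactly the kind of flexibility that the one-time dimension reduction step discussed above rules out.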
